Identification and Molecular Characterization of Cotton
Leaf Curl Begomovirus Complex Infecting Cotton in Baluchistan, Pakistan
Kamran Rashid1, Mohsin Tariq1, Ioly Kotta-Loizou2,
Muhammad Ashraf1, Shabnum
Shaheen3 and Khadim Hussain1*
1Department of Bioinformatics and
Biotechnology, Government College University Faisalabad, Pakistan
2Department of Life Sciences,
Faculty of Natural Sciences, Imperial College London, United Kingdom
3Department of Botany, Lahore
College for Women University Lahore, Pakistan
*For correspondence: hussaink@gcuf.edu.pk
Received 03 October 2022;
Accepted 13 February 2023; Published 17 March 2023
Abstract
Cotton
leaf curl disease (CLCuD) is a serious threat to cotton productivity throughout
the world, caused by whitefly transmitted single stranded DNA viruses belonging
to the genus Begomovirus. Typical begomovirus disease symptoms were
observed on cotton crop in 2017, in the Barkhan district, Baluchistan province,
Pakistan. Symptomatic leaves were sampled and subjected to PCR amplification
using both universal and specific primers for begomovirus and their
satellites. The amplicons were cloned and sequenced; analysis of the resulting
full-length viral sequences identified the begomovirus as a strain of cotton leaf curl Multan virus
(CLCuMuV), which was associated with the cognate cotton leaf curl Multan betasatellite (CLCuMuB) and two
alpha-satellite molecules, cotton leaf
curl Multan alphasatellite (CLCuMuA) and okra leaf curl alphasatellite (OLCuA). To the best of our
knowledge, this is the first report of the begomovirus-satellite complex
causing CLCuD in the Baluchistan province of Pakistan. © 2023 Friends Science Publishers
Keywords:
Begomoviruses; Beta-satellites; Alpha-satellites; Cotton leaf curl Multan virus (CLCuMuV); Cotton leaf curl Multan beta-satellite (CLCuMuB); Cotton leaf curl Multan alphasatellite (CLCuMuA)
Introduction
Cotton (Gossypium hirsutum)
is a very important agricultural commodity, and the export of cotton fibre and
cotton goods plays a critical role in the agriculture-based economies of many
cotton-growing countries, including India and Pakistan. Cotton
leaf curl disease (CLCuD) has a
significant impact on cotton production in Pakistan and northern India (Sattar et
al. 2013; Uniyal et al. 2019). This devastating disease was noted
for the first time in the late 1960s, close to the city of Multan, Pakistan
(Hussain and Mahmood 1988; Ali et al. 2019) and quickly spread to
practically all cotton-growing districts in the Punjab province and the
north-western areas of India near the Pakistan border. Although CLCuD attracted
minor attention initially, during the 1990s it emerged as an epidemic and
caused major yield loss of cotton. CLCuD not only reduced the yield of cotton
but also negatively impacted its quality-determining characteristics, such as
fibre fineness, staple length and fibre bundle strength. This was due to compositional
changes in fibre components including cellulose, protein, pectin, and wax
(Farooq et al. 2011; Monga and Sain 2021). Cotton breeders established
certain CLCuD-tolerant types in the late 1990s, through traditional breeding
and selection, and cotton output in Pakistan was returned to pre-epidemic
levels. Unfortunately, CLCuD symptoms reappeared in these tolerant types during
the 2001–2002 growing season, originally near the town of Burewala in the
Punjab province, indicating that the cotton resistance had been compromised
(Mansoor et al. 2003). This marked the start of a second CLCuD epidemic,
which quickly expanded over Pakistan and India's northern territories.
While Punjab
sustained heavy losses of cotton crops during the first epidemic, other
provinces of Pakistan including Sindh, Khyber Pakhtunkhwa and Baluchistan
remained unaffected. Begomoviruses were reported in Sindh, infecting crops,
non-crop weeds, and ornamental plants (Sanz et al. 2000), but the Sindh
begomoviruses and their transmitting whitefly vectors were different from those
in Punjab (Simón et al. 2003). CLCuD was initially reported in Sindh
during 1997–1998, but losses were minimal as compared to those observed in
Punjab (Mansoor et al. 1998). Since 2003–2004, however, CLCuD has been
inflicting considerable losses in central and lower Sindh, as a result of
introducing cotton types extremely sensitive to CLCuD and not approved for
cultivation by the Pakistani government.
Baluchistan is
the largest province of Pakistan, but agriculture is limited due to the
plethora of mountainous and arid regions. Most of the land consists of dry,
barren hilly regions and is not arable; however, some valleys are very fertile
and suitable for agriculture. In Baluchistan, cultivation of cotton started in
the early 1970s and has recently expanded manifold. For instance, only 300 ha
were used for cotton cultivation in 1981, which increased to 41,000 ha in 2015
(Batool and Saeed 2017). Baluchistan has a high potential for cotton
cultivation, which continues to increase, and numerous
districts, such as Lasbella, Jafferabad, Nasirabad, Kachhi, Sibi, Kohlu, Dera
Bugti, Barkhan, Khuzdar, Turbat (Kech), Kharan, Naushki, and Loralai, have
cotton fields where a sufficient quantity of water is available (GOB 2014;
Kakar et al. 2018). In Pakistan, CLCuD was reported initially in Punjab
and subsequently in Sindh; however, no investigations have been performed in
Baluchistan to date. In the current study, we focus on the identification and
molecular characterization of a begomovirus complex associated with CLCuD
affecting cotton crops in Baluchistan. To our knowledge, this is the first report of
a cotton leaf curl begomovirus complex infecting cotton in the province of
Baluchistan, Pakistan.
Materials and Methods
Sample collection, extraction of plant genomic DNA and PCR amplification
During the 2017 growing season, cotton leaves exhibiting
CLCuD symptoms were collected from different fields of the Barkhan district in
the Baluchistan province, Pakistan. Total genomic DNA was extracted using the
modified CTAB method as described in Akram et al. (2017) and subjected to
rolling circle amplification (RCA). The RCA product was used as a template for
PCR amplification with universal and sequence-specific primers, including
universal back-to-back primer pairs Begomo F/Begomo R (Shahid et al. 2007) and Beta01/Beta02 (Briddon et al. 2002; Bull et al. 2003;
Hussain et al. 2003), together with the sequence-specific primer pairs
CLCV1/CLCV2 and DNA101/DNA102.
Cloning and sequencing of amplified product
The amplicons were cloned in the pGEM®-T Easy Vector and
Sanger sequencing of both strands was performed by Macrogen Inc. Korea using
primer walking. The complete genome sequence of the virus, including
beta-satellite and alpha-satellite molecules, was assembled using Lasergene
(DNA-Star Inc., Madison, WI, USA) and submitted to the GenBank database
(accession numbers MW594402, MW603840, MW645089 and MW645090).
Sequence analysis
Similar sequences were retrieved from public databases,
such as GenBank, using the Basic Local Alignment Search Tool (BLAST; Zhang et al. 2000) and multiple
sequence alignments were performed using Clustal W, as implemented by Molecular
Evolutionary Genetics Analysis (MEGA) version 7.0 (Kumar et al. 2016). Pairwise sequence comparisons were done
using MegAlign, as implemented by Lasergene, and Sequence Demarcation Tool
(SDT; Muhire et al. 2014).
Phylogenetic analysis was conducted by constructing a phylogenetic tree using
the neighbor-joining algorithm, as implemented by MEGA 7.0 with 1,000 bootstrap
replicates. Virus species names were retrieved from ICTV (http://www.ictvonline.org/virustaxonomy.asp)
and abbreviations of begomovirus names and their satellites are shown as
previously (Varsani et al. 2017).
The online National Centre for Biotechnology Information (NCBI) tool ORF-finder
was used to identify open reading frames (ORFs) in the sequence of the virus
and the related satellites (https://www.ncbi.nlm.nih.gov/orffinder/).
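The pairwise comparisons described above can be illustrated with a minimal sketch. It assumes that percent identity is computed over aligned columns in which neither sequence carries a gap, which is in the spirit of what SDT reports; whether it reproduces the output of SDT or MegAlign exactly is not claimed here. The function names and the toy alignment are hypothetical and are not data from this study.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption of this sketch: columns where either sequence has a gap
    character ('-') are skipped rather than counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of aligned sequences."""
    return {
        (x, y): pairwise_identity(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = {
    "isolate_A": "ATGC-ATGCA",
    "isolate_B": "ATGCTATGCA",
    "isolate_C": "ATGC-TTGCC",
}

for (x, y), pct in identity_matrix(toy_alignment).items():
    print(f"{x} vs {y}: {pct:.1f}%")
```

In practice the input would be the Clustal W alignment exported from MEGA or SDT, read into a name-to-sequence mapping before calling `identity_matrix`.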
Results
Sanger
sequencing of cloned genomic components revealed that the begomovirus
(CT4-begomo clone, MW594402) was 2738 bp, the two alpha-satellites
(CT13-alpha clone, MW645089 and CT45-alpha clone,
MW645090) were 1370 bp and
1371 bp, respectively, and the beta-satellite (CT4-beta clone, MW603840)
was 1358 bp in length. For these and similar sequences retrieved from public
databases, pairwise distance matrices were constructed using MegAlign and SDT and phylogenetic analyses were
performed using MEGA.
Pairwise distance matrix (Fig. 1a) and phylogenetic analysis (Fig. 1b)
of the CT4-begomo complete nucleotide sequence confirmed its identity as a
CLCuMuV strain since it grouped together with other CLCuMuV strains from
Pakistan and India and showed 98.7% sequence similarity with CLCuMuV strains
(MG373551 and MG373556) isolated from hollyhock plants (Alcea rosea) in
New Delhi, India (Kumar et al. 2020).
CT13-alpha and CT45-alpha were
identified as strains of okra
leaf curl alphasatellite (OLCuA) and cotton leaf curl Multan alphasatellite
(CLCuMuA), respectively. Pairwise distance matrix analysis
(Fig. 2a) showed that CT13-alpha had 78–99.5% similarity to other alpha-satellite
molecules found in Pakistan: 99.5% with OLCuA
(LN811059), 99.4% with Ageratum
enation alphasatellite (AEA; MH510295,
MH510296) and 96% with Ageratum conyzoides symptomless
alphasatellite (ACSLA; KX656850) sequences. Similarly, CT45-alpha had
82.7–98.7% similarity with different isolates of CLCuMuA (MK357286, MT966813
and MT966814), found in different districts of Punjab, Pakistan. Phylogenetic
analysis (Fig. 2b) also illustrated that CT13-alpha grouped with different
isolates of OLCuA, AEA and ACSLA, while
CT45-alpha grouped with different isolates of CLCuMuA.
Fig. 1: (a) Pairwise distance matrix of the CT4-begomo clone
(MW594402_CLCuMuV_PK) sequence, aligned by CLUSTAL W using the Sequence Demarcation
Tool (SDT). The % identity of the CT4-begomo clone with similar sequences from
public databases is shown. (b)
Neighbor-joining phylogenetic tree constructed based on the alignment of
CT4-begomo with closely related begomovirus sequences. The CT4-begomo clone
obtained in the current study is highlighted in green. The sequence of
maize streak virus (MSV) serves as the outgroup and is highlighted in orange
Fig. 2: (a) SDT pairwise distance matrix of the two isolated
alpha-satellite sequences (MW645089_CT13_alpha and MW645090_CT45_alpha), which
were aligned by the CLUSTAL W tool in SDT. The % identity of the isolated
alpha-satellite molecules with similar sequences from public databases is
shown. (b)
Neighbor-joining phylogenetic tree constructed based on the alignment of
CT13-alpha and CT45-alpha with closely related alpha-satellite sequences.
CT13-alpha and CT45-alpha are highlighted in green. The sequence of CLCuMuB
serves as the outgroup and is highlighted in orange
CT4-beta was revealed as a strain of cotton leaf curl Multan betasatellite
(CLCuMuB). Pairwise distance matrix (Fig. 3a) and phylogenetic analysis (Fig.
3b) showed up to 99.34% similarity with several other strains of CLCuMuB
(LT549459, KT228323,
KY52352, LT549459, MK357271, MK357276 and MK357283) found in Pakistan and India.
Table 1: Positions and coding capacity of predicted genes and ORFs, with percentage sequence similarity, for the CT4 full-genome virus clone, the CT4 beta-satellite and the CT13 and CT45 alpha-satellite molecules isolated from cotton in Baluchistan
Name | Strand | Frame | Start-Stop | Length (nt/aa) | Highest percentage similarity (virus name, accession number)
CT4 clone full genome virus | | | | |
Rep/C1 | -ve | 3 | 2583-1495 | 1089/362 | 97.15% (CLCuMuV, KX656795)
TrAP/C2 | -ve | 1 | 1598-1146 | 453/150 | 99.56% (CLCuMuV, MG373556)
REn/C3 | -ve | 2 | 1453-1049 | 405/134 | 99.75% (CLCuMuV, MG373556)
C4 | -ve | 1 | 2429-2127 | 303/100 | 98.68% (PeLCV, MN910265)
C5 | -ve | 1 | 791-60 | 732/243 | 100% (CLCuV, KY120361)
Coat protein/V1 | +ve | 3 | 276-1046 | 771/256 | 100% (CLCuV, KY120361)
V2 | +ve | 2 | 116-472 | 357/118 | 100% (CLCuMuV, MG373551)
CT4 beta-satellite genome ORF | | | | |
BetaC1 | -ve | 2 | 550-194 | 357/118 | 99.44% (CLCuMuB, MN910267)
CT13 alpha-satellite genome ORF | | | | |
Replication associated protein (Rep) | +ve | 1 | 82-1029 | 948/315 | 99.68% (OLCuA, LN811059)
CT45 alpha-satellite genome ORF | | | | |
Replication associated protein (Rep) | +ve | 2 | 77-1024 | 948/315 | 99.22% (CLCuMuA, MK357290)
Fig. 3: (a) Pairwise distance matrix of the isolated
beta-satellite molecule (MW603840_CT4_Beta) sequence, aligned by CLUSTAL W using the
Sequence Demarcation Tool (SDT). The % identity of the CT4-beta molecule with
similar sequences from public databases is shown. (b) Neighbor-joining phylogenetic tree constructed based
on the alignment of CT4-beta with closely related beta-satellite sequences.
CT4-beta is highlighted in green. The sequence of CLCuMuA serves as the
outgroup and is highlighted in orange
Potential gene sequences and their
hypothetical protein sequences were analyzed using the NCBI ORF finder tool. In the case
of CT4-begomo, two ORFs, V1 (coat protein) and V2 (pre-coat protein) were
identified on the virion strand, whereas five ORFs, C1, C2, C3, C4 and C5, were
identified on the complementary strand. CT13-alpha and CT45-alpha each had a single
ORF encoding a replication-associated protein (Rep) on the virion strand.
CT4-beta had a single ORF, known as beta C1 (βC1), on the complementary strand. The ORF lengths, their
coordinates, the number of encoded amino acids and their % similarity with the
most closely related virus and satellite genes available in public databases
are described in Table 1.
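The coordinates in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes that an ORF's nucleotide length is |stop - start| + 1, that the reported amino acid count excludes the stop codon, and that a start coordinate larger than the stop coordinate marks a complementary-strand (-ve) ORF. These conventions are inferred from the table rather than stated by the ORF-finder tool, and they hold only for ORFs that do not span the origin of the circular genome. The function names and row labels are hypothetical.

```python
# Rows transcribed from Table 1: (ORF, strand, start, stop, length_nt, length_aa).
# The labels of the two alpha-satellite Rep rows are descriptive names chosen here.
TABLE_1 = [
    ("Rep/C1", "-ve", 2583, 1495, 1089, 362),
    ("TrAP/C2", "-ve", 1598, 1146, 453, 150),
    ("REn/C3", "-ve", 1453, 1049, 405, 134),
    ("C4", "-ve", 2429, 2127, 303, 100),
    ("C5", "-ve", 791, 60, 732, 243),
    ("Coat protein/V1", "+ve", 276, 1046, 771, 256),
    ("V2", "+ve", 116, 472, 357, 118),
    ("BetaC1", "-ve", 550, 194, 357, 118),
    ("Rep (OLCuA-like alpha-satellite)", "+ve", 82, 1029, 948, 315),
    ("Rep (CT45 alpha-satellite)", "+ve", 77, 1024, 948, 315),
]


def expected_values(start, stop):
    """Strand, nucleotide length and amino acid count implied by coordinates."""
    length_nt = abs(stop - start) + 1  # coordinates are inclusive
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    length_aa = length_nt // 3 - 1  # assumption: stop codon is not counted
    strand = "+ve" if start < stop else "-ve"
    return strand, length_nt, length_aa


def inconsistent_rows(rows):
    """Names of rows whose reported values disagree with their coordinates."""
    bad = []
    for name, strand, start, stop, length_nt, length_aa in rows:
        if expected_values(start, stop) != (strand, length_nt, length_aa):
            bad.append(name)
    return bad


print(inconsistent_rows(TABLE_1))
```

Under these assumptions, an empty list means every reported strand and length agrees with its coordinates.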
Discussion
Cotton
(Gossypium spp.) has a major economic
impact in cotton-producing countries worldwide. After China, the United States,
India, and Brazil, Pakistan is the fifth-largest producer of cotton (Aslam et
al. 2022), with the average cotton yield being about 570.99 kg.hm−2.
Currently, cotton production is declining due to climate change and various
biotic stresses (Razzaq et al. 2021).
Various factors influence cotton yield, the most important being CLCuD caused
by begomoviruses. CLCuD was initially reported in 1912 in Nigeria, while in
Pakistan it was reported for the first time in 1967 in the Punjab province (Farooq et al. 2014; Ali et al. 2019)
and the Sindh province remained unaffected until 1997 (Mansoor et al. 1998, 2006). Pakistan is the most productive
country in the world in terms of research related to CLCuD and more than 217
articles have been published on this topic (Khan et al. 2020). However,
no investigations for CLCuD have been conducted and no CLCuD has been reported
in the Baluchistan province of Pakistan. Here we report for the first time the
identification and characterization of CLCuMuV and its associated satellites as
a complex causing CLCuD in cotton plants collected from Baluchistan, Pakistan.
Analysis of the complete genome sequence of the begomovirus we
discovered showed maximum nucleotide sequence identity with CLCuMuV isolates
from New Delhi, India (Kumar et al. 2020).
According to Zerbini et al. (2017),
CLCuMuB is required for the development of CLCuD symptoms by the majority of
viruses that cause CLCuD. In this study, we have also identified a
beta-satellite that exhibited maximum similarity with different isolates of CLCuMuB. In addition to the beta-satellite, we
isolated two alpha-satellites from the same cotton samples. The first alpha-satellite showed maximum similarity with
different isolates of CLCuMuA, first identified in cotton plants from
central Punjab (HE965684, HE979548, HE965680, HE966424, HE979546; Mansoor et al. 1999; Siddiqui et al.
2016) and also from Burewala (misnamed as CLCuBuA in GenBank; FN658728),
Punjab, Pakistan (Hameed et al. 2014).
The second alpha-satellite showed the highest
similarity with OLCuA, AEA and ACSLA, three alpha-satellite molecules isolated
from Pakistan. OLCuA was first isolated from okra plants in Pakistan (AJ512954;
Briddon et al. 2004) and subsequently found in cotton samples from
Punjab, Pakistan (e.g., HE966420,
HE966418, HE979544, HE966417, HE966416, HE972285; Siddiqui et al. 2016). Begomoviruses undergo
recombination and pseudo-recombination, allowing them to overcome
plant defense mechanisms and establish bona
fide infection. This is a serious problem as the virus can modify itself by
recombination and can evolve into a more virulent complex by capturing diverse
satellite components. According to literature, the Old-World alpha-satellites
have been demonstrated to play a role in the suppression of gene silencing
(Nawaz-ul-Rehman et al. 2010; Abbas et al. 2019), and they also
reduce disease symptoms (Luo et al. 2019; Kumar et al. 2021).
However, research on New World alpha satellites (Nogueira et al. 2021)
elucidated that their presence can enhance the severity of symptoms. Their
ability to suppress gene silencing may be one reason why these alpha satellites
are maintained by helper viruses. Furthermore, the alpha satellites interfere
with the transmission of the virus by the whitefly vector. Therefore, it can be
presumed that alpha satellites suppress the plant’s natural immunity and
provide the necessary platform for viral infection during the initial stages.
However, they might compete with the helper virus and lower its replication
during the later infection stages.
Interestingly, a recent study highlighted that the presence of a
beta-satellite impairs the maintenance of the alpha-satellite and mutations in
the CP of helper virus further reduce its titer (Iqbal et al. 2022). In
our study, two different alpha-satellites together with a beta-satellite are
found to be associated with a helper virus. The molecular and cellular
mechanisms underlying the interactions between these alpha- and beta-satellites
could be an interesting research topic for the future and help in understanding
the maintenance of alpha-satellites by the helper virus.
Conclusion
In
the current study, CLCuMuV was found in association with two different
alpha-satellites and a dominant cotton leaf curl Multan beta-satellite in
infected cotton. A limitation of the study is
that the samples were collected exclusively from Barkhan, an eastern district
connected to the Punjab province. More extensive studies with comprehensive
sampling from all cotton-growing districts of Baluchistan are needed to
establish the complete CLCuD-causing complex of begomoviruses in Baluchistan.
Acknowledgments
The
current work was funded by Higher Education Commission (HEC) Pakistan under the
NRPU research project No: 5594/ Punjab/NRPU/R&D/HEC/2016 awarded to Mohsin
Tariq.
Author Contributions
KH
designed the project and studies. KR, MA and KH collected the samples and
performed the experiments. KR, KH, MT and SS analyzed data. KR, KH and IKL
wrote the manuscript. All authors read and approved the final manuscript.
Conflicts of Interest
Authors declare no conflict of interests and all authors
read and approved the manuscript and agreed to submit it in IJAB for
publication.
Data Availability
The full genome sequence of the CT4-begomo clone is available in the GenBank database under
accession number MW594402, the CT13 and CT45 alpha-satellites
under accession numbers MW645089 and MW645090, respectively, and the CT4-beta
betasatellite under accession number MW603840.
Ethics Approval
The present research did not involve any animals as
experimental organisms; therefore, approval from an ethical committee was not needed.
References
Abbas Q, I Amin, S
Mansoor, M Shafiq, M Wassenegger, RW Briddon (2019). The Rep proteins encoded
by alphasatellites restore expression of a transcriptionally silenced green
fluorescent protein transgene in Nicotiana
benthamiana. Vir Dis 30:101‒105
Akram A, K Hussain, N
Nahid, Mahmood-ur-Rahman, A Nasim, S Shaheen (2017). Molecular characterization
of a begomovirus and associated satellites from cotton (Gossypium hirsutum)
from Dera Ghazi Khan district of Pakistan. J Anim Plant Sci 27:1245‒1255
Ali MA, J Farooq, A
Batool, A Zahoor, F Azeem, A Mahmood, K Jabran (2019). Cotton production in
Pakistan, pp:249‒276.
Wiley, New York, USA
Aslam AR, F Farooq,
R Ahmad, A Mirza (2022). Influence of greenhouse gas emissions and green
revolution on agriculture production in case study of Pakistan: Policy adoption.
Pak J Human Soc Sci 10:1099‒1110
Batool S, F Saaed (2017).
Mapping the Cotton Value Chain in
Pakistan: A Preliminary Assessment for Identification of Climate
Vulnerabilities and Pathways to Adaptation, p:19. https://think-asia.org/bitstream/handle/11540/7035/A-preliminary-assessment.pdf?sequence=1
Briddon RW, SE Bull,
S Mansoor, I Amin, PG Markham (2002). Universal primers for the PCR-mediated
amplification of DNA β; a molecule associated with some monopartite
begomoviruses. Mol Biotechnol 20:315‒318
Farooq A, J Farooq,
A Mahmood, A Shakeel, KA Rehman, A Batool, M Riaz, MT Shahid, S Mehboob (2011).
An overview of cotton leaf curl virus disease (CLCuD) a serious threat to
cotton productivity. Aust J Crop Sci 13:1823‒1831
GOB
(2014). Crop Reporting Services. Agriculture Statistics Government of
Baluchistan, Pakistan
Hameed U, M Zia-Ur-Rehman, HW Herrmann, MS Haider, JK Brown
(2014). First report of Okra enation leaf curl virus and associated cotton leaf
curl Multan beta-satellite and cotton leaf curl Multan alphasatellite infecting
cotton in Pakistan: A new member of the cotton leaf curl disease complex. Plant Dis 98:1447‒1447
Hussain
T, T Mahmood (1988). A note on leaf curl disease of cotton. Pak Cottons
32:248‒251
Iqbal Z, M Shafiq,
RW Briddon (2022). Cotton leaf curl Multan betasatellite impaired ToLCNDV
ability to maintain cotton leaf curl Multan alphasatellite. Braz J Biol
84:e260922
Khan A, D Khan, F Akbar (2020). Bibliometric analysis of
publications on research into cotton leaf curl disease. Discoveries 8:e109
Kumar M, F Zarreen,
S Chakraborty (2021). Roles of two distinct alphasatellites modulating
geminivirus pathogenesis. Virol J 18:249
Kumar M, RV Kumar, S Chakraborty (2020).
Association of a begomovirus-satellite complex with yellow vein and leaf curl
disease of hollyhock (Alcea rosea) in
India. Arch Virol 165:2099‒2103
Luo C, ZQ Wang, X Liu, L Zhao, X Zhou, Y Xie
(2019). Identification and analysis of potential genes regulated by an
alphasatellite (TYLCCNA) that contribute to host resistance against tomato
yellow leaf curl China virus and its betasatellite (TYLCCNV/TYLCCNB) infection
in Nicotiana benthamiana. Viruses
11:442‒462
Mansoor S, I Amin, S
Iram, M Hussain, Y Zafar, KA Malik, RW Briddon (2003). Breakdown of resistance
in cotton to cotton leaf curl disease in Pakistan. Plant Pathol 52:784‒784
Mansoor S, M Hussain, SH Khan, AB Leghari, WA Siddiqui,
KA Malik, Y Zafar, GA Panwar, A Bashir (1998). Polymerase chain reaction-based
detection of cotton leaf curl and other whitefly-transmitted geminiviruses from
Sindh. Pak J Biol Sci 1:39‒43
Monga D, SK Sain (2021). Incidence and severity of
cotton leaf curl virus disease on different BG II hybrids and its effect on the
yield and quality of cotton crop. J Environ Biol 42:90‒98
Muhire BM, A Varsani, DP Martin (2014). SDT: A virus
classification tool based on pairwise sequence alignment and identity
calculation. PLoS One 9:e108277
Nawaz-ul-Rehman MS,
N Nahid, S Mansoor, RW Briddon, CM Fauquet (2010). Post-transcriptional gene
silencing suppressor activity of two non-pathogenic alphasatellites associated
with a begomovirus. Virology 405:300‒308
Nogueira AM, MB Nascimento, TM Barbosa, AF
Quadros, JP Gomes, AF Orílio, DR Barros, FM Zerbini (2021). The Association
between New World Alphasatellites and Bipartite Begomoviruses: Effects on
Infection and Vector Transmission. Pathogens 10:1244‒1261
Uniyal AP, SK Yadav,
V Kumar (2019). The CRISPR–Cas9, genome editing approach: A promising tool for
drafting defense strategy against begomoviruses including cotton leaf curl
viruses. J Plant Biochem Biotechnol 28:121‒132
Zhang Z, S Schwartz,
L Wagner, W Miller (2000). A greedy algorithm for aligning DNA sequences. J
Comput Biol 7:203‒214